Abstract. We consider the problem of selecting sequentially a unimodal subsequence
from a sequence of independent identically distributed random variables, and we find
that a person doing optimal sequential selection does within a factor of the square
root of two as well as a prophet who knows all of the random observations in advance
of any selections. Our analysis applies in fact to selections of subsequences that
have $d+1$ monotone blocks, and, by including the case $d = 0$, our analysis also
covers monotone subsequences.

1. Introduction

A classic result of Erdős and Szekeres (1935) tells us that in any sequence
$x_1, x_2, \ldots, x_n$ of $n$ real numbers there is a subsequence of length
$k = \lceil n^{1/2} \rceil$ that is either monotone increasing or monotone decreasing.
More precisely, given $x_1, x_2, \ldots, x_n$ one can always find a subsequence
$1 \le n_1 < n_2 < \cdots < n_k \le n$ for which we either have
\[
  x_{n_1} \le x_{n_2} \le \cdots \le x_{n_k}
  \quad \text{or} \quad
  x_{n_1} \ge x_{n_2} \ge \cdots \ge x_{n_k}.
\]

Many years later, Fan Chung (1980) considered the analogous problem for unimodal
sequences. Specifically, she sought to determine the maximum value $\ell(n)$ such
that in any sequence of $n$ real values $x_1, x_2, \ldots, x_n$ one can find a subsequence
$x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ of length $k = \ell(n)$ and a "turning place" $1 \le t \le k$
for which one either has
\[
  x_{i_1} \le x_{i_2} \le \cdots \le x_{i_t} \ge x_{i_{t+1}} \ge \cdots \ge x_{i_k}
\]
or
\[
  x_{i_1} \ge x_{i_2} \ge \cdots \ge x_{i_t} \le x_{i_{t+1}} \le \cdots \le x_{i_k}.
\]
Through a sustained and instructive analysis, she surprisingly obtained an exact
formula:
\[
  \ell(n) = \lceil (3n - 3/4)^{1/2} - 1/2 \rceil.
\]

Shortly afterwards, Steele (1981) considered unimodal subsequences of permutations,
or equivalently, unimodal subsequences of a sequence of $n$ independent, uniformly
distributed random variables $X_1, X_2, \ldots, X_n$. For the random variables
\[
  U_n = \max\{k : X_{i_1} < X_{i_2} < \cdots < X_{i_t} > X_{i_{t+1}} > \cdots > X_{i_k},
        \text{ where } 1 \le i_1 < i_2 < \cdots < i_k \le n\},
\]
and
\[
  D_n = \max\{k : X_{i_1} > X_{i_2} > \cdots > X_{i_t} < X_{i_{t+1}} < \cdots < X_{i_k},
        \text{ where } 1 \le i_1 < i_2 < \cdots < i_k \le n\},
\]
it was established that
\[
  \mathbb{E}[\max\{U_n, D_n\}] \sim \mathbb{E}[U_n] \sim \mathbb{E}[D_n] \sim 2(2n)^{1/2}
  \quad \text{as } n \to \infty. \tag{1}
\]
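The off-line quantity $U_n$ can be computed directly for any finite sequence. The following is a minimal sketch, not taken from the paper: the function names are illustrative, and the method (patience sorting from both ends) is a standard technique chosen here for convenience.

```python
from bisect import bisect_left


def _lis_ending_at(xs):
    """out[i] = length of the longest strictly increasing subsequence ending at xs[i]."""
    tails, out = [], []
    for x in xs:
        j = bisect_left(tails, x)  # first tail >= x, so equal values never extend a run
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
        out.append(j + 1)
    return out


def longest_unimodal(xs):
    """Length of the longest subsequence that strictly increases to a peak, then strictly decreases."""
    if not xs:
        return 0
    up = _lis_ending_at(xs)
    # A decreasing run starting at i is an increasing run ending at i in the reversed list.
    down = _lis_ending_at(xs[::-1])[::-1]
    return max(u + d - 1 for u, d in zip(up, down))  # the peak is counted once
```

Averaging `longest_unimodal` over many uniform samples of size $n$ gives a Monte Carlo estimate that can be set against the asymptotic value $2(2n)^{1/2}$ in (1).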

Here we consider analogs of the random variables $U_n$, $D_n$, and $I_n = \max\{U_n, D_n\}$,
but instead of seeing the whole sequence all at once, one observes the variables
sequentially. Thus, for each $1 \le i \le n$, the chooser must decide at time $i$ when
$X_i$ is first presented whether to accept or reject $X_i$ as an element of the unimodal
subsequence. The sequential (or on-line) selection for the much simpler problem of
a monotone subsequence, the analog of the original Erdős and Szekeres (1935)
problem, was considered long ago in Samuels and Steele (1981).

Main Results. We denote by $\Pi(n)$ the set of all feasible policies for the unimodal
sequential selection problem for $\{X_1, X_2, \ldots, X_n\}$ where these random variables
are independent with a common continuous distribution function $F$. Given any
feasible sequential selection policy $\pi_n \in \Pi(n)$, if we let $\tau_k$ denote the index of the
$k$'th selected element, then for each $k$ the value $\tau_k$ is a stopping time with respect
to the increasing sequence of $\sigma$-fields $\mathcal{F}_i = \sigma\{X_1, X_2, \ldots, X_i\}$, $1 \le i \le n$.
In terms of these stopping times, the random variable
\[
  U^o_n(\pi_n) = \max\{k : X_{\tau_1} < X_{\tau_2} < \cdots < X_{\tau_t} > X_{\tau_{t+1}} > \cdots > X_{\tau_k},
        \text{ where } 1 \le \tau_1 < \tau_2 < \cdots < \tau_k \le n\},
\]
is the length of the unimodal subsequence that is selected by the policy $\pi_n$. For
the moment, we just consider unimodal subsequences that begin with an increasing
piece and end with a decreasing piece; either of these pieces is permitted to have
size one.

For each $n$ there is a policy $\pi^*_n \in \Pi(n)$ that maximizes the expected length of the
selected subsequence, and the main issue is to determine the asymptotic behavior
of this expected value. The answer turns out to have an informative relationship
to the off-line selection problem. A prophet with knowledge of the whole sequence
before making his choices will do better than an optimal on-line chooser, but he
will only do better by a factor of $\sqrt{2}$.

Theorem 1 (Expected Length of Optimal Unimodal Subsequences). For each
$n \ge 1$, there is a $\pi^*_n \in \Pi(n)$, such that
\[
  \mathbb{E}[U^o_n(\pi^*_n)] = \sup_{\pi_n \in \Pi(n)} \mathbb{E}[U^o_n(\pi_n)],
\]
and for such an optimal policy one has the upper bound
\[
  \mathbb{E}[U^o_n(\pi^*_n)] < 2n^{1/2}
\]
and the lower bound
\[
  2n^{1/2} - 4(\pi/6)^{1/2} n^{1/4} - O(1) < \mathbb{E}[U^o_n(\pi^*_n)],
\]
which combine to give the asymptotic formula
\[
  \mathbb{E}[U^o_n(\pi^*_n)] \sim 2n^{1/2} \quad \text{as } n \to \infty.
\]

In a natural sense that we will shortly make precise, the optimal policy $\pi^*_n$ is
unique. Consequently, one can ask about the distribution of the length $U^o_n(\pi^*_n)$
of the subsequence that is selected by the optimal policy, and there is a pleasingly
general argument that gives an upper bound for the variance. Moreover, that bound
is good enough to provide a weak law for $U^o_n(\pi^*_n)$.

Theorem 2 (Variance Bound). For the unique optimal policy $\pi^*_n \in \Pi(n)$, one has
the bounds
\[
  \mathrm{Var}[U^o_n(\pi^*_n)] \le \mathbb{E}[U^o_n(\pi^*_n)] < 2n^{1/2}. \tag{2}
\]

Corollary 1 (Weak Law for Unimodal Sequential Selections). For the sequence of
optimal policies $\pi^*_n \in \Pi(n)$, one has the limit
\[
  U^o_n(\pi^*_n)/\sqrt{n} \xrightarrow{\;p\;} 2 \quad \text{as } n \to \infty.
\]
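The passage from the variance bound to the weak law is the usual Chebyshev argument. As a sketch that uses only the bounds stated in Theorems 1 and 2, with $\epsilon > 0$ an arbitrary constant introduced here for illustration:

```latex
\begin{align*}
\mathbb{P}\bigl( \lvert U^o_n(\pi^*_n) - \mathbb{E}[U^o_n(\pi^*_n)] \rvert \ge \epsilon n^{1/2} \bigr)
  &\le \frac{\mathrm{Var}[U^o_n(\pi^*_n)]}{\epsilon^2 n}
   \le \frac{\mathbb{E}[U^o_n(\pi^*_n)]}{\epsilon^2 n}
   < \frac{2 n^{1/2}}{\epsilon^2 n}
   = \frac{2}{\epsilon^2 n^{1/2}} \longrightarrow 0, \\
\frac{\mathbb{E}[U^o_n(\pi^*_n)]}{n^{1/2}} &\longrightarrow 2
  \qquad \text{(by the asymptotic formula of Theorem 1)}.
\end{align*}
```

Combining the two lines gives convergence in probability of $U^o_n(\pi^*_n)/n^{1/2}$ to 2.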

Organization of the Proofs. 

The proof of Theorem 1 comes in two halves. First, we show by an elaboration of
an argument of Gnedin (1999) that there is an a priori upper bound for $\mathbb{E}[U^o_n(\pi_n)]$
for all $n$ and all $\pi_n \in \Pi(n)$. This argument uses almost nothing about the structure
of the selection policy beyond the fact from Section 4 that it suffices to consider
policies that are specified by acceptance intervals. For the lower bound we simply
construct a good (but suboptimal) policy. Here there is an obvious candidate, but
the proof of its efficacy seems to be more delicate than one might have expected.

The proof of Theorem 2 in Section 3 exploits a martingale that comes naturally
from the Bellman equation. The summands of the quadratic variation of this
martingale are then found to have a fortunate relationship to the probability that
an observation is selected. It is this "self-bounding" feature that leads one to the
bound (2) of the variance by the mean.

In Section 5 we outline analogs of Theorems 1 and 2 for subsequences that can
be decomposed into $d + 1$ alternating monotone blocks (rather than just two). If
one takes $d = 0$, this reduces to the monotone subsequence problem, and in this
case only the variance bound is new. Finally, in Section 6 we comment briefly on
two conjectures. These deal with a more refined understanding of $\mathrm{Var}[U^o_n(\pi^*_n)]$ and
with the naturally associated central limit theorem.

2. Mean Bounds: Proof of Theorem 1

Since the distribution $F$ is assumed to be continuous and since the problem is
unchanged by replacing $X_i$ by its monotone transformation $F(X_i)$, we can assume
without loss of generality that the $X_i$ are uniformly distributed on $[0, 1]$. Next, we
introduce two tracking variables. First, we let $S_i$ denote the value of the last
element that has been selected up to and including time $i$. We then let $R_i$ denote
an indicator variable that tracks the monotonicity of the selected subsequence;
specifically we set $R_i = 0$ if the selections made up to and including time $i$ are
increasing; otherwise we set $R_i = 1$.

The sequence of real values $\{S_i : R_i = 0, 1 \le i \le n\}$ is thus a monotone
increasing sequence, though of course not in the strict sense because there will
typically be long patches where the successive values of $S_i$ do not change. Similarly,
$\{S_i : R_i = 1, 1 \le i \le n\}$ is a monotone decreasing sequence, and the full sequence
$\{S_i : 1 \le i \le n\}$ is a unimodal sequence, in the non-strict sense that permits
"flat spots." As a convenience for later formulas, we also set $S_0 = 0$ and $R_0 = 0$.

The Class of Feasible Interval Policies. Here we will consider feasible policies
that have acceptance sets that are given by intervals. It is reasonably obvious that
any optimal policy must have this structure, but for completeness we give a formal
proof of this fact in Section 4.

Now, if the value $X_i$ is under consideration for selection, two possible scenarios
can occur: if $R_{i-1} = 0$ (so one is in the "increasing part" of the selected subsequence)
then a selectable $X_i$ can be above or below $S_{i-1}$. On the other hand, if
$R_{i-1} = 1$ (and one is in the "decreasing part" of the selected subsequence), then
any selectable $X_i$ has to be smaller than $S_{i-1}$. Thus, to specify a feasible interval
policy, we just need to specify for each $i$ an interval $[a, b] \subseteq [0, 1]$ where we accept
$X_i$ if $X_i \in [a, b]$ and we reject it otherwise. Here, the values of the end-points of the
interval are functions of $i$, $S_{i-1}$, and $R_{i-1}$. In longhand, we write the acceptance
interval as
\[
  \Delta_i(S_{i-1}, R_{i-1}) = [a(i, S_{i-1}, R_{i-1}),\; b(i, S_{i-1}, R_{i-1})].
\]

There are some restrictions on the functions $a(i, S_{i-1}, R_{i-1})$ and $b(i, S_{i-1}, R_{i-1})$.
To make these explicit we consider two sets of functions, $\mathcal{A}$ and $\mathcal{B}$. We say $a \in \mathcal{A}$
provided that $a : \{1, 2, \ldots, n\} \times [0, 1] \times \{0, 1\} \to [0, 1]$ and
\[
  0 \le a(i, s, r) \le s \quad \text{for all } s \in [0, 1],\ r \in \{0, 1\} \text{ and } 1 \le i \le n.
\]
Similarly, we say $b \in \mathcal{B}$ provided that $b : \{1, 2, \ldots, n\} \times [0, 1] \times \{0, 1\} \to [0, 1]$ and
\[
  s \le b(i, s, 0) \le 1 \quad \text{for all } s \in [0, 1] \text{ and } 1 \le i \le n;
\]
\[
  0 \le b(i, s, 1) = s \quad \text{for all } s \in [0, 1] \text{ and } 1 \le i \le n.
\]
Together a pair $(a, b) \in \mathcal{A} \times \mathcal{B}$ defines an interval policy $\pi_n \in \Pi(n)$ where we accept
$X_i$ at time $i$ if and only if $X_i \in \Delta_i(S_{i-1}, R_{i-1})$. We let $\Pi'(n)$ denote the set of
feasible interval policies.

Three Representations. First we note that for $S_i$ we have a simple update rule
driven by whether $X_i$ is rejected or accepted:
\[
  S_i = \begin{cases}
    S_{i-1} & \text{if } X_i \notin \Delta_i(S_{i-1}, R_{i-1}) \\
    X_i     & \text{if } X_i \in \Delta_i(S_{i-1}, R_{i-1}).
  \end{cases}
\]
For the sequence $\{R_i\}$ the update rule is initialized by setting $R_0 = 0$; one should
then note that only one change takes place in the values of the sequence $\{R_i\}$.
Specifically, we change to $R_i = 1$ at the first $i$ such that $S_i < S_{i-1}$, i.e. the first
instance where we have a decrease in our sequence of selected values. For specificity,
we can rewrite this rule as
\[
  R_i = \begin{cases}
    1 & \text{if } X_i \in \Delta_i(S_{i-1}, R_{i-1}) \text{ and } X_i < \max\{S_k : 1 \le k < i\} \\
    R_{i-1} & \text{otherwise.}
  \end{cases} \tag{3}
\]
Finally, using $\mathbb{1}(E)$ to denote the indicator function of the event $E$, we see by
counting the occurrences of the "selection events" $X_i \in \Delta_i(S_{i-1}, R_{i-1})$, that for
each $1 \le k \le n$ the number of selections made up to and including time $k$ is given
by the sum of the indicators
\[
  U^o_k(\pi_n) = \sum_{i=1}^{k} \mathbb{1}\bigl(X_i \in \Delta_i(S_{i-1}, R_{i-1})\bigr). \tag{4}
\]
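The three update rules for $S_i$, $R_i$ and the selection count translate directly into a short simulation loop. The sketch below is illustrative only: the function name and the convention of passing the thresholds as Python callables `a(i, s, r)` and `b(i, s, r)` are assumptions made here, not notation from the paper.

```python
def run_interval_policy(xs, a, b):
    """Apply an interval policy with thresholds a(i, s, r), b(i, s, r) to values xs in [0, 1].

    Returns the list of selected values together with the final state (S_n, R_n).
    """
    s, r = 0.0, 0  # S_0 = 0 and R_0 = 0
    selected = []
    for i, x in enumerate(xs, start=1):
        lo, hi = a(i, s, r), b(i, s, r)
        if lo <= x <= hi:      # X_i lies in the acceptance interval
            if x < s:          # first selected decrease switches to the decreasing phase
                r = 1
            s = x              # S_i = X_i on acceptance, otherwise S_i = S_{i-1}
            selected.append(x)
    return selected, s, r
```

The number of selections is `len(selected)`, which is the sum of indicators in the representation of $U^o_n(\pi_n)$.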
Proof of the Upper Bound (An a priori Prophet Inequality). The immediate
task is to show that for all $n \ge 1$ and all $\pi_n \in \Pi'(n)$, one has the inequality
\[
  \mathbb{E}[U^o_n(\pi_n)] < 2n^{1/2}. \tag{5}
\]
It will then follow from Proposition 1 in Section 4 that the bound (5) holds for all
$\pi_n \in \Pi(n)$. We start with the representation (4) and then after two applications of
the Cauchy-Schwarz inequality we have
\[
  \mathbb{E}[U^o_n(\pi_n)]
  = \sum_{i=1}^{n} \mathbb{E}\bigl[b(i, S_{i-1}, R_{i-1}) - a(i, S_{i-1}, R_{i-1})\bigr]
  \le n^{1/2} \Bigl\{ \sum_{i=1}^{n} \mathbb{E}\bigl[(b(i, S_{i-1}, R_{i-1}) - a(i, S_{i-1}, R_{i-1}))^2\bigr] \Bigr\}^{1/2}.
\]
The target bound (5) is therefore an immediate consequence of the following,
curiously general, lemma.

Lemma 1 (Telescoping Bound). For each $n \ge 1$ and for any strategy $\pi_n \in \Pi'(n)$,
one has the inequality
\[
  \sum_{i=1}^{n} \mathbb{E}\bigl[(b(i, S_{i-1}, R_{i-1}) - a(i, S_{i-1}, R_{i-1}))^2\bigr] < 4. \tag{6}
\]

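Proof. We first introduce a bookkeeping function $g : [0, 1] \times \{0, 1\} \to [0, 2]$ by setting
\[
  g(s, r) = \begin{cases}
    s     & \text{if } r = 0 \\
    2 - s & \text{if } r = 1.
  \end{cases}
\]
Trivially $g$ is bounded by 2, and we will argue by conditioning and telescoping that
the left side of inequality (6) is bounded above by $2\mathbb{E}[g(S_n, R_n)] < 4$. Specifically,
if we condition on $\mathcal{F}_{i-1}$, then the independence and uniform distribution of $X_i$ gives
us, after a few lines of straightforward calculation, that
\begin{align*}
  \mathbb{E}[g(S_i, R_i) - g(S_{i-1}, 0) \mid \mathcal{F}_{i-1}]
  &= \int_{a(i, S_{i-1}, 0)}^{S_{i-1}} (g(x, 1) - S_{i-1}) \, dx
   + \int_{S_{i-1}}^{b(i, S_{i-1}, 0)} (g(x, 0) - S_{i-1}) \, dx \\
  &= \tfrac{1}{2} \bigl(b(i, S_{i-1}, 0) - a(i, S_{i-1}, 0)\bigr)^2 \\
  &\quad + \bigl(S_{i-1} - a(i, S_{i-1}, 0)\bigr)\bigl(2 - S_{i-1} - b(i, S_{i-1}, 0)\bigr).
\end{align*}
Since the last summand is non-negative we have the tidier bound
\[
  \bigl(b(i, S_{i-1}, 0) - a(i, S_{i-1}, 0)\bigr)^2
  \le 2\, \mathbb{E}[g(S_i, R_i) - g(S_{i-1}, 0) \mid \mathcal{F}_{i-1}]. \tag{7}
\]
By an analogous direct calculation one also has the identity
\[
  \mathbb{E}[g(S_i, 1) - g(S_{i-1}, 1) \mid \mathcal{F}_{i-1}]
  = \int_{a(i, S_{i-1}, 1)}^{S_{i-1}} (S_{i-1} - x) \, dx
  = \tfrac{1}{2} \bigl(b(i, S_{i-1}, 1) - a(i, S_{i-1}, 1)\bigr)^2. \tag{8}
\]
Since $R_{i-1} = 1$ implies $R_i = 1$, we can write $g(S_i, R_i) - g(S_{i-1}, R_{i-1})$ as the sum
\[
  \{g(S_i, R_i) - g(S_{i-1}, 0)\}\, \mathbb{1}(R_{i-1} = 0) + \{g(S_i, 1) - g(S_{i-1}, 1)\}\, \mathbb{1}(R_{i-1} = 1),
\]
so the two bounds (7) and (8) give us the key estimate
\[
  \bigl(b(i, S_{i-1}, R_{i-1}) - a(i, S_{i-1}, R_{i-1})\bigr)^2
  \le 2\, \mathbb{E}[g(S_i, R_i) - g(S_{i-1}, R_{i-1}) \mid \mathcal{F}_{i-1}].
\]
Finally, when we take the total expectation and sum, one sees that telescoping gives
\[
  \sum_{i=1}^{n} \mathbb{E}\bigl[(b(i, S_{i-1}, R_{i-1}) - a(i, S_{i-1}, R_{i-1}))^2\bigr]
  \le 2\, \mathbb{E}[g(S_n, R_n) - g(S_0, R_0)] = 2\, \mathbb{E}[g(S_n, R_n)],
\]
just as needed. □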
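The "few lines of straightforward calculation" behind the bound (7) can be checked numerically. The sketch below is an illustration rather than part of the proof: it evaluates the two integrals for the increasing phase with a midpoint rule (exact for linear integrands up to rounding) and compares them with the closed form $\tfrac12 (b-a)^2 + (s-a)(2-s-b)$. All function names are hypothetical.

```python
def midpoint_integral(f, lo, hi, steps=1000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def g(s, r):
    """Bookkeeping function from the proof of Lemma 1."""
    return s if r == 0 else 2.0 - s


def drift_increasing_phase(a, s, b):
    """E[g(S_i, R_i) - g(S_{i-1}, 0) | F_{i-1}] for S_{i-1} = s, R_{i-1} = 0 and interval [a, b]."""
    below = midpoint_integral(lambda x: g(x, 1) - s, a, s)  # accepted values below s
    above = midpoint_integral(lambda x: g(x, 0) - s, s, b)  # accepted values above s
    return below + above


def closed_form(a, s, b):
    """Right-hand side of the displayed identity."""
    return 0.5 * (b - a) ** 2 + (s - a) * (2.0 - s - b)
```

Because $2 - s - b \ge 0$ and $s - a \ge 0$ for any feasible thresholds, `closed_form(a, s, b)` is never smaller than `0.5 * (b - a) ** 2`, which is exactly the content of (7).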



Proof of the Lower Bound (Exploitation of Suboptimality). We construct
an explicit policy $\tilde\pi_n \in \Pi(n)$ that is close enough to optimal to give us the bound
\[
  2n^{1/2} - 4(\pi/6)^{1/2} n^{1/4} - O(1) < \mathbb{E}[U^o_n(\pi^*_n)]. \tag{9}
\]
The basic idea is to make an approximately optimal choice of an increasing
subsequence from the sample $\{X_i : 1 \le i \le n/2\}$ and an approximately optimal choice
of a decreasing subsequence from the sample $\{X_i : n/2 + 1 \le i \le n\}$. The cost of
giving up a flexible choice of the "turn-around time" is substantial, but this class
of policies is still close enough to optimal to give the required bound (9).

For the moment, we assume that $n$ is even. We then select observations according
to the following process:

• For $1 \le i \le n/2$ we select the observation $X_i$ if and only if $X_i$ falls in the
interval between $S_{i-1}$ and $\min\{1, S_{i-1} + 2n^{-1/2}\}$.

• We set $S_{n/2} = 1$ and for $n/2 + 1 \le i \le n$ we select the observation $X_i$ if and
only if $X_i$ falls in the interval between $\max\{0, S_{i-1} - 2n^{-1/2}\}$ and $S_{i-1}$.
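The two-phase selection process just described can be sketched as a small simulation. This is an illustrative implementation under the stated assumption that $n$ is even; the function names and the use of Python's `random` module are choices made here, not part of the paper.

```python
import math
import random


def two_phase_selection(xs):
    """Run the two-phase window policy on observations xs in [0, 1] (even length assumed)."""
    n = len(xs)
    w = 2.0 / math.sqrt(n)       # window width 2 n^{-1/2}
    half = n // 2
    s = 0.0                      # S_0 = 0
    selected = []
    for i, x in enumerate(xs, start=1):
        if i == half + 1:
            s = 1.0              # reset S_{n/2} = 1 before the decreasing phase
        if i <= half:
            accept = s <= x <= min(1.0, s + w)
        else:
            accept = max(0.0, s - w) <= x <= s
        if accept:
            s = x
            selected.append(x)
    return selected


def simulate(n, seed=0):
    """Draw n uniform observations with a fixed seed and return the selected values."""
    rng = random.Random(seed)
    return two_phase_selection([rng.random() for _ in range(n)])
```

Averaging `len(simulate(n, seed))` over many seeds gives an empirical estimate of the expected number of selections, which can be compared with $2n^{1/2}$.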

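Here, of course, the selections for $1 \le i \le n/2$ are increasing and the selections for
$n/2 + 1 \le i \le n$ are decreasing, so the selected subsequence is indeed unimodal.
We then consider the stopping time
\[
  \nu = \min\{i : S_i > 1 - 2n^{-1/2} \text{ or } i \ge n/2\},
\]
and we note that the representation (4), the suboptimality of the policy $\tilde\pi_n$, and
the symmetry between our policy on $1 \le i \le n/2$ and on $n/2 + 1 \le i \le n$ will give
us the lower bound
\[
  2\, \mathbb{E}\Bigl[ \sum_{i=1}^{\nu} \mathbb{1}\bigl(X_i \in [S_{i-1}, S_{i-1} + 2n^{-1/2}]\bigr) \Bigr]
  \le \mathbb{E}[U^o_n(\tilde\pi_n)] \le \mathbb{E}[U^o_n(\pi^*_n)]. \tag{10}
\]
Wald's Lemma now tells us that
\[
  \mathbb{E}\Bigl[ \sum_{i=1}^{\nu} \mathbb{1}\bigl(X_i \in [S_{i-1}, S_{i-1} + 2n^{-1/2}]\bigr) \Bigr]
  = 2n^{-1/2}\, \mathbb{E}[\nu],
\]
so we have
\[
  4n^{-1/2}\, \mathbb{E}[\nu] \le \mathbb{E}[U^o_n(\pi^*_n)].
\]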
The main task is to estimate $\mathbb{E}[\nu]$. It is a small but bothersome point that the
summands $\mathbb{1}(X_i \in [S_{i-1}, S_{i-1} + 2n^{-1/2}])$ are not i.i.d. over the entirety of the range
$i \in [1, n/2]$; the distribution of the last terms differs from that of the predecessors.
To deal with this nuisance, we take $Z_j$, $1 \le j < \infty$, to be a sequence of random
variables defined by setting
\[
  Z_j = \begin{cases}
    0   & \text{w.p. } 1 - 2n^{-1/2} \\
    U_j & \text{w.p. } 2n^{-1/2},
  \end{cases}
\]
where the $U_j$'s are independent and uniformly distributed on $[0, 2n^{-1/2}]$. Easy
calculations now give us for all $1 \le j < \infty$ that
\[
  \mathbb{E} Z_j = \frac{2}{n}, \qquad
  \mathrm{Var}[Z_j] = \frac{8}{3n^{3/2}} - \frac{4}{n^2} < \frac{8}{3n^{3/2}}, \qquad
  \text{and} \qquad |Z_j - \mathbb{E} Z_j| < \frac{2}{n^{1/2}}. \tag{11}
\]
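The "easy calculations" for the moments of $Z_j$ follow from the moments of a uniform variable on $[0, p]$ with $p = 2n^{-1/2}$. A small sketch (the function name is illustrative, and $n \ge 4$ is assumed so that $p \le 1$ is a valid probability):

```python
import math


def z_moments(n):
    """Mean, variance and sup |Z - EZ| for Z = 0 w.p. 1 - p and Uniform[0, p] w.p. p, with p = 2 n^{-1/2}."""
    p = 2.0 / math.sqrt(n)
    mean = p * (p / 2.0)           # p * E[Uniform(0, p)]
    second = p * (p * p / 3.0)     # p * E[Uniform(0, p)^2]
    var = second - mean ** 2
    max_dev = max(mean, p - mean)  # Z takes values in [0, p]
    return mean, var, max_dev
```

For example, `z_moments(16)` has mean `0.125`, which is $2/n$ for $n = 16$.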



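Next, if we set $\widehat{S}_0 = 0$ and put
\[
  \widehat{S}_i = \sum_{j=1}^{i} Z_j, \quad \text{for } 1 \le i \le n,
\]
then for $1 \le i \le \nu$, we have $S_i \stackrel{d}{=} \widehat{S}_i$. Setting
$\widehat{\nu} = \min\{i : \widehat{S}_i > 1 - 2n^{-1/2} \text{ or } i \ge n/2\}$
we also have $\nu \stackrel{d}{=} \widehat{\nu}$, so to estimate $\mathbb{E}[\nu]$ it then suffices to estimate
\[
  \mathbb{E}[\widehat{\nu}]
  = \sum_{i=0}^{n/2-1} \mathbb{P}(\widehat{\nu} > i)
  = \sum_{i=0}^{n/2-1} \mathbb{P}\bigl(\widehat{S}_i \le 1 - 2n^{-1/2}\bigr)
  = \frac{n}{2} - \sum_{i=0}^{n/2-1} \mathbb{P}\bigl(\widehat{S}_i > 1 - 2n^{-1/2}\bigr).
\]
The proof of the lower bound (9) will then be complete once we check that
\[
  \sum_{i=0}^{n/2-1} \mathbb{P}\bigl(\widehat{S}_i > 1 - 2n^{-1/2}\bigr)
  \le (\pi/6)^{1/2} n^{3/4} + \lceil n^{1/2} \rceil. \tag{12}
\]
This bound turns out to be a reasonably easy consequence of Bernstein's inequality
(cf. Lugosi, 2009, Theorem 6) which asserts that for any i.i.d. sequence $\{Z_j\}$ with
the almost sure bound $|Z_j - \mathbb{E} Z_j| \le M$ one has for all $t > 0$ that
\[
  \mathbb{P}\Bigl( \sum_{j=1}^{i} \{Z_j - \mathbb{E} Z_j\} \ge t \Bigr)
  \le \exp\Bigl\{ - \frac{t^2}{2 i\, \mathrm{Var}[Z_1] + 2Mt/3} \Bigr\}.
\]
If we set $n^* = \lfloor n/2 - n^{1/2} - 1 \rfloor$, then Bernstein's inequality together with the bounds
(11) and some simplification will give us
\begin{align*}
  \sum_{i=0}^{n/2-1} \mathbb{P}\bigl(\widehat{S}_i > 1 - 2n^{-1/2}\bigr)
  &\le \lceil n^{1/2} \rceil + \sum_{i=0}^{n^*} \mathbb{P}\bigl(\widehat{S}_i > 1 - 2n^{-1/2}\bigr) \\
  &\le \lceil n^{1/2} \rceil + \sum_{i=0}^{n^*} \exp\Bigl\{ - \frac{3\,(n - 2i - 2n^{1/2})^2}{8n\,(n^{1/2} - 1)} \Bigr\}.
\end{align*}
The summands are increasing, so the sum is bounded by
\[
  \int_{0}^{n/2 - n^{1/2}} \exp\Bigl\{ - \frac{3\,(n - 2u - 2n^{1/2})^2}{8n\,(n^{1/2} - 1)} \Bigr\} \, du
  = (2/3)^{1/2} (n^{3/2} - n)^{1/2} \int_{0}^{\alpha(n)} e^{-u^2} \, du,
\]
where $\alpha(n) = (3/8)^{1/2} (n^{1/2} - 2)(n^{1/2} - 1)^{-1/2}$. Upon bounding the last integral
by $\pi^{1/2}/2$, one then completes the proof of the target bound (12). Finally, we note
that if $n$ is odd, one can simply ignore the last observation at the cost of decreasing
our lower bound by at most one.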
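The final comparison between the sum of exponential terms and the Gaussian-integral bound $(\pi/6)^{1/2} n^{3/4}$ is easy to check numerically for specific $n$. The sketch below only evaluates both sides; the function names are illustrative, and $n$ is assumed even and large enough that $n^* \ge 0$.

```python
import math


def bernstein_sum(n):
    """Sum over 0 <= i <= n* of exp(-3 (n - 2i - 2 sqrt(n))^2 / (8 n (sqrt(n) - 1)))."""
    root = math.sqrt(n)
    n_star = math.floor(n / 2 - root - 1)
    denom = 8.0 * n * (root - 1.0)
    return sum(
        math.exp(-3.0 * (n - 2 * i - 2 * root) ** 2 / denom)
        for i in range(n_star + 1)
    )


def gaussian_bound(n):
    """Closed-form bound (pi / 6)^{1/2} n^{3/4}."""
    return math.sqrt(math.pi / 6.0) * n ** 0.75
```

Evaluating `bernstein_sum(n)` and `gaussian_bound(n)` side by side for a few values of $n$ shows how much slack the integral comparison leaves.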

Remark. A benefit of Bernstein's inequality (and the slightly sharper Bennett 
inequality) is that one gets to take advantage of the good bound on Var[Zj]. The 
workhorse Hoeffding inequality would be blind to this useful information. 

3. Variance Bound: Proof of Theorem 2

To prove the variance bound in Theorem 2 we need some of the machinery of the
Bellman equation and dynamic programming. To introduce the classical backward
induction, we first set $v_i(s, r)$ equal to the expected length of the longest unimodal
subsequence of $\{X_i, X_{i+1}, \ldots, X_n\}$ that is obtained by sequential selection when
$S_{i-1} = s$ and $R_{i-1} = r$. We then have the "terminal conditions"
\[
  v_n(s, 0) = 1, \quad v_n(s, 1) = s, \quad \text{for all } s \in [0, 1],
\]
and we set
\[
  v_{n+1}(s, r) = 0 \quad \text{for all } s \in [0, 1] \text{ and } r \in \{0, 1\}.
\]
For $1 \le i \le n - 1$ we have the Bellman equation:
\[
  v_i(s, r) = \begin{cases}
    \displaystyle \int_0^s \max\{v_{i+1}(s, 0),\, 1 + v_{i+1}(x, 1)\} \, dx
      + \int_s^1 \max\{v_{i+1}(s, 0),\, 1 + v_{i+1}(x, 0)\} \, dx & \text{if } r = 0 \\[2ex]
    \displaystyle (1 - s)\, v_{i+1}(s, 1)
      + \int_0^s \max\{v_{i+1}(s, 1),\, 1 + v_{i+1}(x, 1)\} \, dx & \text{if } r = 1.
  \end{cases} \tag{13}
\]
One should note that the map $s \mapsto v_i(s, 0)$ is continuous and strictly decreasing on
$[0, 1]$ for $1 \le i \le n - 1$ with $v_n(s, 0) = 1$ for all $s \in [0, 1]$. In addition, the map
$s \mapsto v_i(s, 1)$ is continuous and strictly increasing on $[0, 1]$ for all $1 \le i \le n$.
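The backward induction can be carried out numerically. The following is a rough sketch, not the authors' computation: it discretizes $s$ on an equally spaced grid and replaces each integral in the Bellman equation by a trapezoid rule, so the result is only an approximation of $v_1(0, 0) = \mathbb{E}[U^o_n(\pi^*_n)]$. The function names and the grid size `m` are assumptions made here.

```python
def _trapezoid(values, h):
    """Trapezoid rule on equally spaced samples; a single sample integrates to zero."""
    if len(values) < 2:
        return 0.0
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def value_at_start(n, m=81):
    """Approximate v_1(0, 0) by backward induction on an m-point grid for s."""
    grid = [j / (m - 1) for j in range(m)]
    h = 1.0 / (m - 1)
    v0 = [1.0] * m   # terminal condition v_n(s, 0) = 1
    v1 = grid[:]     # terminal condition v_n(s, 1) = s
    for _ in range(n - 1):  # steps i = n - 1, ..., 1
        new0, new1 = [], []
        for j in range(m):
            # r = 0, x <= s: accepting x starts the decreasing phase
            turn = [max(v0[j], 1.0 + v1[k]) for k in range(j + 1)]
            # r = 0, x >= s: accepting x continues the increasing phase
            rise = [max(v0[j], 1.0 + v0[k]) for k in range(j, m)]
            # r = 1, x <= s: accepting x continues the decreasing phase
            fall = [max(v1[j], 1.0 + v1[k]) for k in range(j + 1)]
            new0.append(_trapezoid(turn, h) + _trapezoid(rise, h))
            new1.append((1.0 - grid[j]) * v1[j] + _trapezoid(fall, h))
        v0, v1 = new0, new1
    return v0[0]
```

For $n = 2$ the recursion returns 2 (up to rounding), since any two observations form a unimodal sequence; for larger $n$ the output can be compared with the upper bound $2n^{1/2}$ of Theorem 1.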
If we now define $a^* : \{1, 2, \ldots, n\} \times [0, 1] \times \{0, 1\} \to [0, 1]$ by setting
\[
  a^*(i, s, r) = \inf\{x \in [0, s] : v_{i+1}(s, r) \le 1 + v_{i+1}(x, 1)\}, \tag{14}
\]
then we have $a^* \in \mathcal{A}$. Similarly, if we define $b^* : \{1, 2, \ldots, n\} \times [0, 1] \times \{0, 1\} \to [0, 1]$
by setting
\[
  b^*(i, s, r) = \begin{cases}
    \sup\{x \in [s, 1] : v_{i+1}(s, 0) \le 1 + v_{i+1}(x, 0)\} & \text{if } r = 0 \\
    s & \text{if } r = 1,
  \end{cases} \tag{15}
\]
then we have $b^* \in \mathcal{B}$. Here, $a^*(i, s, r)$ and $b^*(i, s, r)$ are state-dependent thresholds
for which one is indifferent between (i) selecting the current observation $x$, adjusting
$r$ to $r'$ as in (3), and continuing to act optimally with new state pair $(x, r')$, or (ii)
rejecting the current observation, $x$, and continuing to act optimally with unchanged
state pair, $(s, r)$.

By the Bellman equation (13) and the continuity and monotonicity properties
of the value function, the values $a^*$ and $b^*$ provide us with a unique acceptance
interval for all $1 \le i \le n$ and all pairs $(s, r)$. The policy $\pi^*_n$ associated with $a^*$ and
$b^*$ then accepts $X_i$ at time $1 \le i \le n$ if and only if
\[
  X_i \in \Delta^*_i(S_{i-1}, R_{i-1}) \equiv [a^*(i, S_{i-1}, R_{i-1}),\; b^*(i, S_{i-1}, R_{i-1})],
\]
where, as in Section 2, $S_{i-1}$ is the value of the last observation selected up to
and including time $i - 1$, and $R_{i-1}$ tracks the direction of the monotonicity of the
subsequence selected up to and including time $i - 1$. In Section 4 we will prove
that this policy is indeed the unique optimal policy for the sequential selection of
a unimodal subsequence.

We do not need a detailed analysis of $a^*$ and $b^*$, but it is useful to collect some
facts. In particular, one should note that $a^*(i, s, r) = 0$ whenever $v_{i+1}(s, r) \le 1$
and $b^*(i, s, 0) = 1$ whenever $v_{i+1}(s, 0) \le 1$. In addition, the difference
$b^*(i, s, r) - a^*(i, s, r)$ provides us with an explicit bound on the increments of the value
function $v_i(s, r)$, as the following lemma shows.

Lemma 2. For all $s \in [0, 1]$, $r \in \{0, 1\}$ and $1 \le i \le n$, we have
\[
  0 \le v_i(s, r) - v_{i+1}(s, r) \le b^*(i, s, r) - a^*(i, s, r) \le 1. \tag{16}
\]

Proof. The lower bound is trivial and it follows by the fact that $v_i(s, r)$ is strictly
decreasing in $i$ for each $(s, r) \in [0, 1] \times \{0, 1\}$.

For the upper bound, we first assume that $r = 0$. Then, subtracting $v_{i+1}(s, 0)$
on both sides of equation (13) when $r = 0$ and using the definition of $a^*$ and $b^*$, we
obtain
\begin{align*}
  v_i(s, 0) - v_{i+1}(s, 0)
  &= -\bigl(b^*(i, s, r) - a^*(i, s, r)\bigr)\, v_{i+1}(s, 0) \\
  &\quad + \int_{a^*(i, s, r)}^{s} (1 + v_{i+1}(x, 1)) \, dx
         + \int_{s}^{b^*(i, s, r)} (1 + v_{i+1}(x, 0)) \, dx.
\end{align*}
Recalling the monotonicity property of $s \mapsto v_{i+1}(s, r)$, we then have
\begin{align*}
  v_i(s, 0) - v_{i+1}(s, 0)
  &\le -\bigl(b^*(i, s, r) - a^*(i, s, r)\bigr)\, v_{i+1}(s, 0) \\
  &\quad + (s - a^*(i, s, r))(1 + v_{i+1}(s, 1)) + (b^*(i, s, r) - s)(1 + v_{i+1}(s, 0)),
\end{align*}
and since $v_{i+1}(s, 1) \le v_{i+1}(s, 0)$, we finally obtain
\[
  v_i(s, 0) - v_{i+1}(s, 0) \le b^*(i, s, r) - a^*(i, s, r) \le 1,
\]
as (16) requires. The proof for $r = 1$ is very similar and it is therefore omitted. □

We now come to the main lemma of this section.

Lemma 3. The process defined by
\[
  Y_i = U^o_i(\pi^*_n) + v_{i+1}(S_i, R_i) \quad \text{for all } 0 \le i \le n,
\]
is a martingale with respect to the natural filtration $\{\mathcal{F}_i\}_{0 \le i \le n}$. Moreover, for the
martingale difference sequence $d_i = Y_i - Y_{i-1}$ one has that
\[
  |d_i| = |Y_i - Y_{i-1}| \le 1 \quad \text{for all } 1 \le i \le n.
\]

Proof. We first note that $Y_i$ is $\mathcal{F}_i$-measurable and bounded. Then, from the definition
of $v_i(s, r)$ we have that
$v_i(S_{i-1}, R_{i-1}) = \mathbb{E}[U^o_n(\pi^*_n) - U^o_{i-1}(\pi^*_n) \mid \mathcal{F}_{i-1}]$. Thus,
\[
  Y_i = U^o_i(\pi^*_n) + \mathbb{E}[U^o_n(\pi^*_n) - U^o_i(\pi^*_n) \mid \mathcal{F}_i]
      = \mathbb{E}[U^o_n(\pi^*_n) \mid \mathcal{F}_i],
\]
which is clearly a martingale.

To see that the martingale differences are bounded, let
\[
  W_i = v_{i+1}(S_{i-1}, R_{i-1}) - v_i(S_{i-1}, R_{i-1})
\]
represent the change in $Y_i$ if we do not select $X_i$, and let
\[
  Z_i = \bigl(1 + v_{i+1}(X_i, \mathbb{1}(X_i < S_{i-1})) - v_{i+1}(S_{i-1}, R_{i-1})\bigr)\,
        \mathbb{1}\bigl(X_i \in \Delta^*_i(S_{i-1}, R_{i-1})\bigr)
\]
represent the additional change when we do select $X_i$. We then have that
\[
  d_i = W_i + Z_i,
\]
and by our Lemma 2 we know that $-1 \le W_i \le 0$. Moreover, the definition of the
threshold functions $a^*$ and $b^*$ and the monotonicity property of $s \mapsto v_{i+1}(s, r)$ give
us that $0 \le Z_i \le 1$, so that $|d_i| \le 1$, as desired. □

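Final Argument for the Variance Bound. For the martingale differences
$d_i = Y_i - Y_{i-1}$ we have
\[
  Y_n - Y_0 = \sum_{i=1}^{n} d_i, \quad \text{and} \quad
  \mathrm{Var}[Y_n] = \mathbb{E}\Bigl[ \sum_{i=1}^{n} d_i^2 \Bigr],
\]
and we also have the initial representation
\[
  Y_0 = U^o_0(\pi^*_n) + v_1(S_0, R_0) = v_1(0, 0) = \mathbb{E}[U^o_n(\pi^*_n)]
\]
and the terminal identity
\[
  Y_n = U^o_n(\pi^*_n) + v_{n+1}(S_n, R_n) = U^o_n(\pi^*_n).
\]
We now recall the decomposition $d_i = W_i + Z_i$ introduced in the proof of Lemma 3,
where
\[
  W_i = v_{i+1}(S_{i-1}, R_{i-1}) - v_i(S_{i-1}, R_{i-1})
\]
and
\[
  Z_i = \bigl(1 + v_{i+1}(X_i, \mathbb{1}(X_i < S_{i-1})) - v_{i+1}(S_{i-1}, R_{i-1})\bigr)\,
        \mathbb{1}\bigl(X_i \in \Delta^*_i(S_{i-1}, R_{i-1})\bigr).
\]
Since $W_i$ is $\mathcal{F}_{i-1}$-measurable, we have
\[
  \mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1}]
  = \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}] + 2 W_i\, \mathbb{E}[Z_i \mid \mathcal{F}_{i-1}] + W_i^2.
\]
We also have $0 = \mathbb{E}[d_i \mid \mathcal{F}_{i-1}] = W_i + \mathbb{E}[Z_i \mid \mathcal{F}_{i-1}]$, so
\[
  \mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1}] = \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}] - W_i^2. \tag{17}
\]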
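For readers who want the intermediate step, the identity (17) comes from substituting the martingale relation $\mathbb{E}[Z_i \mid \mathcal{F}_{i-1}] = -W_i$ into the expansion of the conditional second moment. Spelled out, using only the quantities defined above:

```latex
\begin{align*}
\mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1}]
  &= \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}]
     + 2 W_i\, \mathbb{E}[Z_i \mid \mathcal{F}_{i-1}] + W_i^2 \\
  &= \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}] + 2 W_i (-W_i) + W_i^2 \\
  &= \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}] - W_i^2 .
\end{align*}
```

In particular $\mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1}] \le \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}]$, which is the only consequence of (17) needed for the variance bound.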

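Finally, from the definition of $Z_i$, $a^*$ and $b^*$ we obtain
\begin{align*}
  \mathbb{E}[Z_i^2 \mid \mathcal{F}_{i-1}]
  &= \int_{a^*(i, S_{i-1}, R_{i-1})}^{b^*(i, S_{i-1}, R_{i-1})}
     \bigl(1 + v_{i+1}(x, \mathbb{1}(x < S_{i-1})) - v_{i+1}(S_{i-1}, R_{i-1})\bigr)^2 \, dx \\
  &\le b^*(i, S_{i-1}, R_{i-1}) - a^*(i, S_{i-1}, R_{i-1}),
\end{align*}
since the integrand is bounded by 1. Summing (17), applying the last bound, and
taking expectations gives us
\[
  \mathrm{Var}[U^o_n(\pi^*_n)]
  \le \sum_{i=1}^{n} \mathbb{E}\bigl[b^*(i, S_{i-1}, R_{i-1}) - a^*(i, S_{i-1}, R_{i-1})\bigr]
  = \mathbb{E}[U^o_n(\pi^*_n)],
\]
where the last equality follows from our basic representation (4).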



ON-LINE UNIMODAL SELECTION 






4. Intermezzo: Optimality and Uniqueness of Interval Policies 

The unimodal sequential selection problem is a finite horizon Markov decision
problem with bounded rewards and finite action space, and for such a problem it
is known that there exists a non-randomized Markov policy $\pi^*_n$ that is optimal (cf.
Bertsekas and Shreve, 1978, Corollary 8.5.1). This amounts to saying that there
exists an optimal strategy $\pi^*_n$ such that for each $i$, $S_{i-1}$ and $R_{i-1}$, there is a Borel set
$D^*_i(S_{i-1}, R_{i-1}) \subseteq [0, 1]$ such that $X_i$ is accepted if and only if $X_i \in D^*_i(S_{i-1}, R_{i-1})$.
Here we just want to show that the Borel sets $D^*_i(S_{i-1}, R_{i-1})$ are actually intervals
(up to null sets).

Given the optimal acceptance sets $D^*_i(S_{i-1}, R_{i-1})$, $1 \le i \le n$, we now set
\[
  v_i(S_{i-1}, R_{i-1}) = \mathbb{E}\Bigl[ \sum_{k=i}^{n} \mathbb{1}\bigl(X_k \in D^*_k(S_{k-1}, R_{k-1})\bigr) \Bigm| \mathcal{F}_{i-1} \Bigr],
\]
so we have the recursion
\[
  v_i(S_{i-1}, R_{i-1}) = \mathbb{E}\bigl[ \mathbb{1}(X_i \in D^*_i(S_{i-1}, R_{i-1})) + v_{i+1}(S_i, R_i) \bigm| \mathcal{F}_{i-1} \bigr], \tag{18}
\]
and $v_i(s, r)$ is just the optimal expected number of selections made from the
subsample $\{X_i, X_{i+1}, \ldots, X_n\}$ given that $S_{i-1} = s$ and $R_{i-1} = r$. We then note that
$v_n(s, 0) = 1$ for all $s \in [0, 1]$, and one can check by induction on $i$ that the map
$s \mapsto v_i(s, 0)$ is continuous and strictly decreasing in $s$ for $1 \le i \le n - 1$. A similar
argument also gives that the map $s \mapsto v_i(s, 1)$ is continuous and strictly increasing
in $s$ for all $1 \le i \le n$.
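If we now set
\[
  a(i, S_{i-1}, R_{i-1}) = \operatorname{ess\,inf} D^*_i(S_{i-1}, R_{i-1}) \quad \text{and} \quad
  b(i, S_{i-1}, R_{i-1}) = \operatorname{ess\,sup} D^*_i(S_{i-1}, R_{i-1}),
\]
then we want to show for all $1 \le i \le n$ and all $(S_{i-1}, R_{i-1})$ that we have
\[
  \mathbb{P}\bigl(\{D^*_i(S_{i-1}, R_{i-1})^c \cap [a(i, S_{i-1}, R_{i-1}),\, b(i, S_{i-1}, R_{i-1})]\}\bigr) = 0.
\]
To argue by contradiction, we suppose that there is an $1 \le i \le n$ and an
acceptance set $D^*_i = D^*_i(S_{i-1}, R_{i-1})$ that is not equivalent to an interval; i.e. we
suppose
\[
  \mathbb{P}\bigl(\{D^{*c}_i \cap [a(i, S_{i-1}, R_{i-1}),\, b(i, S_{i-1}, R_{i-1})]\}\bigr) > 0. \tag{19}
\]
We then consider the sets
\[
  L_i = [0, S_{i-1}] \cap D^*_i \quad \text{and} \quad U_i = [S_{i-1}, 1] \cap D^*_i,
\]
and we introduce the intervals
\[
  \tilde{L}_i = [S_{i-1} - |L_i|,\, S_{i-1}] \quad \text{and} \quad \tilde{U}_i = [S_{i-1},\, S_{i-1} + |U_i|],
\]
where $|A|$ denotes the Lebesgue measure of a set $A$. The set $\tilde{D}_i = \tilde{L}_i \cup \tilde{U}_i$ is also
an interval and $|\tilde{D}_i| = |D^*_i|$, so, if we can show that
\[
  \mathbb{E}[\mathbb{1}(X_i \in D^*_i) + v_{i+1}(S_i, R_i)] < \mathbb{E}[\mathbb{1}(X_i \in \tilde{D}_i) + v_{i+1}(S_i, R_i)], \tag{20}
\]
then the representation (18) tells us that policy $\pi^*_n$ is not optimal, a contradiction.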
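To prove the bound (20), we note that
\begin{align*}
  &\mathbb{E}[\mathbb{1}(X_i \in \tilde{D}_i) + v_{i+1}(S_i, R_i) \mid \mathcal{F}_{i-1}]
   - \mathbb{E}[\mathbb{1}(X_i \in D^*_i) + v_{i+1}(S_i, R_i) \mid \mathcal{F}_{i-1}] \\
  &\qquad = \mathbb{E}\bigl[ v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in \tilde{D}_i)
            - v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in D^*_i) \bigm| \mathcal{F}_{i-1} \bigr],
\end{align*}
since $\tilde{D}_i$ and $D^*_i$ are $\mathcal{F}_{i-1}$-measurable and
$\mathbb{E}[\mathbb{1}(X_i \in \tilde{D}_i) \mid \mathcal{F}_{i-1}] = \mathbb{E}[\mathbb{1}(X_i \in D^*_i) \mid \mathcal{F}_{i-1}]$.
By our construction, we also have the identities
\[
  \mathbb{E}\bigl[ v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in \tilde{D}_i) \bigm| \mathcal{F}_{i-1} \bigr]
  = \int_{\tilde{L}_i} v_{i+1}(x, 1) \, dx + \int_{\tilde{U}_i} v_{i+1}(x, 0) \, dx, \tag{21}
\]
and
\[
  \mathbb{E}\bigl[ v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in D^*_i) \bigm| \mathcal{F}_{i-1} \bigr]
  = \int_{L_i} v_{i+1}(x, 1) \, dx + \int_{U_i} v_{i+1}(x, 0) \, dx. \tag{22}
\]
Now since $|\tilde{L}_i| = |L_i|$ implies that $|\tilde{L}_i \cap L^c_i| = |\tilde{L}^c_i \cap L_i|$, we can write
\begin{align*}
  \int_{\tilde{L}_i} v_{i+1}(x, 1) \, dx - \int_{L_i} v_{i+1}(x, 1) \, dx
  &= \int_{\tilde{L}_i \cap L^c_i} v_{i+1}(x, 1) \, dx - \int_{\tilde{L}^c_i \cap L_i} v_{i+1}(x, 1) \, dx \\
  &= (\beta_i - \alpha_i)\, |\tilde{L}_i \cap L^c_i|, \tag{23}
\end{align*}
where $\alpha_i = \alpha_i(S_{i-1}, R_{i-1})$ and $\beta_i = \beta_i(S_{i-1}, R_{i-1})$ are chosen according to the
mean value theorem for integrals. The sets $\tilde{L}_i \cap L^c_i$ and $\tilde{L}^c_i \cap L_i$ are almost surely
disjoint since $\tilde{L}_i \cap L^c_i \subseteq [S_{i-1} - |L_i|,\, S_{i-1}]$ and $\tilde{L}^c_i \cap L_i \subseteq [0,\, S_{i-1} - |L_i|]$. So, we
find that $\alpha_i < \beta_i$ since $v_{i+1}(x, 1)$ is strictly increasing in $x$.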
A perfectly analogous argument tells us that we can write
\[
  \int_{\tilde{U}_i} v_{i+1}(x, 0) \, dx - \int_{U_i} v_{i+1}(x, 0) \, dx
  = (\delta_i - \gamma_i)\, |\tilde{U}_i \cap U^c_i|, \tag{24}
\]
where $\gamma_i < \delta_i$ and $\gamma_i$ and $\delta_i$ depend on $(S_{i-1}, R_{i-1})$. If we now set
\[
  c_i(S_{i-1}, R_{i-1}) = \min\{\beta_i - \alpha_i,\; \delta_i - \gamma_i\},
\]
then the identities (21) and (22) and the differences (23) and (24) give us the bound
\[
  c_i(S_{i-1}, R_{i-1})\, |\tilde{D}_i \cap D^{*c}_i|
  \le \mathbb{E}\bigl[ v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in \tilde{D}_i)
      - v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in D^*_i) \bigm| \mathcal{F}_{i-1} \bigr].
\]
Since $c_i(S_{i-1}, R_{i-1}) > 0$, the assumption (19) implies that the left-hand side above
is strictly positive. When we take the total expectation we get
\[
  0 < \mathbb{E}\bigl[ v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in \tilde{D}_i)
      - v_{i+1}(X_i, R_i)\, \mathbb{1}(X_i \in D^*_i) \bigr].
\]
In view of the recursion (18), this contradicts the optimality of $\pi^*_n$. This completes
the proof of (20), and, in summary, we have the following proposition.

Proposition 1. If $\pi^*_n$ is an optimal non-randomized Markov policy for the unimodal
sequential selection problem, then, up to sets of measure zero, $\pi^*_n$ is an interval
policy.

Corollary 2. There is a unique policy $\pi^*_n \in \Pi(n)$ that is optimal.

To prove the corollary one combines the optimality of the interval policy given
by Proposition 1 with the monotonicity properties of the Bellman equation (13).
Specifically, the map $s \mapsto v_i(s, 0)$ is strictly decreasing in $s$ for all $1 \le i \le n - 1$ and
the map $s \mapsto v_i(s, 1)$ is strictly increasing in $s$ for all $1 \le i \le n$, so the equations
(14) and (15) determine the values $a^*(\cdot)$ and $b^*(\cdot)$ uniquely.
5. Generalizations and Specializations: d-Modal Subsequences

There are natural analogs of Theorems 1 and 2 for "d-modal subsequences," by
which we mean subsequences that are allowed to make "d turns" rather than just
one. Equivalently these are subsequences that are the concatenation of (at most)
$d + 1$ monotone subsequences. If we let $U^{o,d}_n(\pi^*_n)$ denote the analog of $U^o_n(\pi^*_n)$ when
the selected subsequence is d-modal, then the arguments of the preceding sections
may be adapted to provide information on the expected value of $U^{o,d}_n(\pi^*_n)$ and its
variance. Here one should keep in mind that the case $d = 0$ is not excepted; the
arguments of the preceding sections do indeed apply to the selection of monotone
subsequences.

Theorem 3 (Expected Length of Optimal d-Modal Subsequences). If $\Pi(n)$ denotes
the class of feasible policies for the d-modal subsequence selection problem, then
there is a unique $\pi^*_n \in \Pi(n)$ such that
\[
  \mathbb{E}[U^{o,d}_n(\pi^*_n)] = \sup_{\pi_n \in \Pi(n)} \mathbb{E}[U^{o,d}_n(\pi_n)].
\]
Moreover, for all $n \ge 1$ and $d \ge 0$ one has
\[
  c(d)^{1/2} n^{1/2} - c(d)^{3/4} (\pi/3)^{1/2} n^{1/4} - O(1)
  < \mathbb{E}[U^{o,d}_n(\pi^*_n)] < c(d)^{1/2} n^{1/2}, \tag{25}
\]
where $c(d) = 2(d + 1)$. In particular, one has
\[
  \mathbb{E}[U^{o,d}_n(\pi^*_n)] \sim \{2(d + 1)\}^{1/2} n^{1/2} \quad \text{as } n \to \infty.
\]

One should note that the case $d = 0$ corresponds to the monotone subsequence
selection problem studied by Samuels and Steele (1981) and more recently by Gnedin
(1999). The monotone selection problem is also equivalent to certain bin packing
problems studied by Bruss and Robertson (1991) and Rhee and Talagrand (1991).

In the special case of $d = 0$, our upper bound (25) agrees with that of Bruss and
Robertson (1991) as well as with the result of Gnedin (1999). Our lower bound
(25) on the mean for $d = 0$ turns out to be slightly worse than that of Rhee and
Talagrand (1991) since our constant for the $n^{1/4}$ term is $2^{3/4} (\pi/3)^{1/2} \approx 1.72$,
while theirs is $8^{1/4} \approx 1.68$.
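The constants in (25) are simple to tabulate. The helper below is an illustrative sketch (the function names are not from the paper); it evaluates $c(d)$, the coefficient of $n^{1/2}$, and the coefficient of $n^{1/4}$ in the lower bound.

```python
import math


def c(d):
    """c(d) = 2 (d + 1) for subsequences that make at most d turns."""
    return 2 * (d + 1)


def leading_constant(d):
    """Coefficient of n^{1/2} in the bounds (25)."""
    return math.sqrt(c(d))


def second_order_constant(d):
    """Coefficient of n^{1/4} in the lower bound of (25)."""
    return c(d) ** 0.75 * math.sqrt(math.pi / 3.0)
```

With $d = 1$ these reproduce the constants $2$ and $4(\pi/6)^{1/2}$ of Theorem 1, and with $d = 0$ the second-order constant rounds to 1.72, as quoted above.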

For the d-modal problem, one can also prove a variance bound that generalizes
Theorem 2 in a natural way.

Theorem 4 (Variance Bound for d-Modal Subsequences). For the unique optimal
policy $\pi^*_n \in \Pi(n)$ one has the bound
\[
  \mathrm{Var}[U^{o,d}_n(\pi^*_n)] \le \mathbb{E}[U^{o,d}_n(\pi^*_n)].
\]

Chebyshev's inequality and Theorem 4 now combine as usual to provide a weak
law for $U^{o,d}_n(\pi^*_n)$. Even for $d = 0$ this variance bound is new.

6. Two Conjectures 

Numerical studies for small $d$ and moderate $n$ support the conjecture that one
has the asymptotic relation
\[
  \mathrm{Var}[U^{o,d}_n(\pi^*_n)] \sim \frac{1}{3}\, \mathbb{E}[U^{o,d}_n(\pi^*_n)] \quad \text{as } n \to \infty. \tag{26}
\]
As observed by an anonymous reader, the methods of Section 3 and the concavity
of the value function established in Samuels and Steele (1981) are in fact enough
to prove an appropriate lower bound



Here one should now be able to prove an upper bound on $\mathrm{Var}[U^{o,0}_n(\pi^*_n)]$ that is
strong enough to establish the case $d = 0$ of the conjecture (26), but confirmation
of this has eluded us.

Also, by numerical calculations of the optimal policy $\pi^*_n$ and by subsequent
simulations of $U^{o,d}_n(\pi^*_n)$ for $d = 0$, $d = 1$, and modest values of $n$, it seems likely
that the random variable $U^{o,d}_n(\pi^*_n)$ obeys a central limit theorem. Specifically, the
natural conjecture is that for all $d \ge 0$ one has



Implicit in this conjecture is the belief that the lower bound (25) can be improved
to $\{2(d + 1)n\}^{1/2} - o(n^{1/4})$, or better.

So far, the only central limit theorem available for a sequential selection problem 
is that obtained by Bruss and Delbaen (2001; 2004) for a Poissonized version of the 
monotone subsequence problem. Given the sequential nature of the problem, it 
appears to be difficult to de-Poissonize the results of Bruss and Delbaen (2004) to 
obtain conclusions about the distribution of $U^{o,d}_n(\pi^*_n)$ even for $d = 0$.

For completeness, we should note that even for the off-line unimodal subsequence
problem, not much more is known about the random variable $U_n$ than its asymptotic
expected value (1). Here one might hope to gain some information about the
distribution of $U_n$ by the methods of Bollobás and Brightwell (1992) and Bollobás
and Janson (1997), and it is even feasible, but only remotely so, that one could
extend the famous distributional results of Baik, Deift and Johansson (1999) to 
unimodal subsequences. More modestly, one certainly should be able to prove that 
the distribution of Un is not asymptotically normal. One motivation for going after 
such a result would be to underline how the restriction to sequential strategies can 
bring one back to the domain of the central limit theorem. 

Acknowledgment: We are grateful to an insightful referee who outlined the proof
of the bound (27) and who suggested the conjecture (26) for $d = 0$.